Dtree: Dynamic Task Scheduling at Petascale
Abstract
Irregular applications are challenging to scale on supercomputers due to the difficulty of balancing load across large numbers of nodes. This challenge is exacerbated by the increasing heterogeneity of modern supercomputers, in which nodes often contain multiple processors and coprocessors operating at different speeds and with differing core and thread counts. We present Dtree, a dynamic task scheduler designed to address this challenge. Dtree shows close to optimal results for a class of HPC applications, improving time-to-solution by achieving near-perfect load balance while consuming negligible resources. We demonstrate Dtree’s effectiveness on up to 77,824 heterogeneous cores of the TACC Stampede supercomputer with two different petascale HPC applications: ParaBLe, which performs large-scale Bayesian network structure learning, and GTFock, which implements Fock matrix construction, an essential and expensive step in quantum chemistry codes. For ParaBLe, we show improved performance while eliminating the complexity of managing heterogeneity. For GTFock, we match the most recently published performance without using any of the application-specific optimizations for data access patterns (such as the task distribution design for communication reduction) that enabled that performance. We also show that Dtree can distribute from tens of thousands to hundreds of millions of irregular tasks across up to 1024 nodes with minimal overhead, while balancing load to within 2% of optimal.
Similar resources
Dynamic Task Scheduling for Scalable Parallel AMR in the Uintah Framework
Uintah is a computational framework for fluid-structure interaction problems using a combination of adaptive mesh refinement (AMR) and MPM particle methods. Uintah uses domain decomposition and a task-graph-based approach for asynchronous communication and automatic message combination. The original task scheduler for Uintah ran computational tasks in a predefined order. To improve the performa...
SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascales
Exascale computers will enable the unraveling of significant scientific mysteries. Predictions are that by 2019, supercomputers will reach exascales with millions of nodes and billions of threads of execution. Many-task computing (MTC) is a new viable distributed paradigm for extreme-scale supercomputing. The MTC paradigm can address four of the five major challenges of exascale computing, name...
Partitioned Parallel Job Scheduling for Extreme Scale Computing
Recent success in building petascale computing systems poses new challenges in job scheduling design to support cluster sizes that can execute up to two million concurrent tasks. We show that for these extreme scale clusters the resource demand at a centralized scheduler can exceed the capacity or limit the ability of the scheduler to perform well. This paper introduces partitioned scheduling, ...
Green Energy-aware task scheduling using the DVFS technique in Cloud Computing
Nowadays, energy consumption has become a critical issue in high-performance distributed computing systems, so green computing tries to reduce energy consumption, carbon footprint, and CO2 emissions in high-performance computing systems (HPCs) such as clusters, grids, and clouds. Reducing energy consumption for high-end computing can bring various benefits such as red...
Bridging the Gaps between Many-task Computing and Supercomputers
Many Task Computing, an emerging programming paradigm on supercomputers, embraces many applications in such domains as biology, economics, and statistics, as well as data intensive computations and uncertainty quantification. Its high inter-task parallelism and intense data processing features place new challenges on the existing hardware-software stack on supercomputers. Those new challenges i...